
Conversation

JonasKunz (Contributor)

Adds a utility function for estimating the number of values less than (or, optionally, less than or equal to) a given value in an exponential histogram. Just like the percentile algorithm, this algorithm assumes that all values in a bucket lie at the point of least relative error within that bucket.

This is required for #135625 in order to implement the range and percentile_ranks aggregations.

I ran the included randomized test on repeat locally to ensure it works correctly.
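
To illustrate the estimation this adds, here is a simplified sketch only: buckets are reduced to parallel arrays of representative values and counts, whereas the actual utility iterates the histogram's buckets, as shown in the review snippet further down.

```java
// Sketch: estimate how many recorded values are below `value` (or at/below it, if `inclusive`),
// assuming all values of a bucket sit at that bucket's point of least relative error.
static long estimateRank(double[] bucketRepresentatives, long[] bucketCounts, double value, boolean inclusive) {
    long rank = 0;
    for (int i = 0; i < bucketRepresentatives.length; i++) {
        double representative = bucketRepresentatives[i];
        if (representative < value || (inclusive && representative == value)) {
            rank += bucketCounts[i];
        }
    }
    return rank;
}
```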

@elasticsearchmachine added the v9.2.0, external-contributor (Pull request authored by a developer outside the Elasticsearch team) and needs:triage (Requires assignment of a team area label) labels on Sep 30, 2025
@JonasKunz added the :StorageEngine/Mapping (The storage related side of mappings) and >non-issue labels and removed the needs:triage (Requires assignment of a team area label) label on Sep 30, 2025
@elasticsearchmachine (Collaborator)

Pinging @elastic/es-storage-engine (Team:StorageEngine)

Comment on lines +100 to +104
double bucketMidpoint = ExponentialScaleUtils.getPointOfLeastRelativeError(buckets.peekIndex(), buckets.scale());
bucketMidpoint = Math.min(bucketMidpoint, maxValue);
if (bucketMidpoint < value || (inclusive && bucketMidpoint == value)) {
rank += buckets.peekCount();
buckets.advance();
Member

For buckets where the value lies between the lower and the upper boundary, I'm wondering whether we should add a count proportional to where the value falls within the bucket. It seems like that could increase the accuracy of the estimate.

In other words, we'd increment the rank by (value - lowerBound) / (upperBound - lowerBound) * count. We can have an optimized special case for value > upperBound where we increment by count.
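
A per-bucket sketch of that idea (hypothetical helper; it assumes values are uniformly distributed within the bucket):

```java
// Hypothetical helper: how much of this bucket's count lies below `value`,
// assuming a uniform distribution of values within [lowerBound, upperBound].
static long interpolatedIncrement(double lowerBound, double upperBound, long count, double value) {
    if (value >= upperBound) {
        return count; // the whole bucket lies below the value
    }
    if (value <= lowerBound) {
        return 0; // the whole bucket lies above the value
    }
    double fraction = (value - lowerBound) / (upperBound - lowerBound);
    return Math.round(fraction * count);
}
```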

JonasKunz (Contributor, Author)

So the algorithm currently assumes that all values in a bucket lie at the point of least relative error, just like the percentiles algorithm does. This ensures that we minimize the relative error of percentile(rank(someValue) / valueCount), meaning that the returned percentile is as close as possible to someValue.
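
For reference: if the relative error of an estimate $p$ for a true value $x$ is taken as $|p - x| / x$, as in the DDSketch paper, the point of least relative error of a bucket $(l, u]$ is where the worst-case errors at the two bucket boundaries coincide:

$$\frac{p - l}{l} = \frac{u - p}{u} \quad\Rightarrow\quad p = \frac{2\,l\,u}{l + u},$$

i.e. the harmonic mean of the bucket boundaries.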

If we now change the assumption of how values are distributed within a bucket, I think we'd need to do the same for the percentiles algorithm. While this would smooth the values, it would also increase the worst-case relative error.

Changing this assumption would probably also mean that we should get rid of upscaling in the exponential histogram merging algorithm: the upscaling there happens in order to make sure that misconfigured SDKs (e.g. a far too low bucket count) don't drag down the accuracy of the overall aggregation.
While the upscaling barely moves the point of least relative error of the buckets, it greatly reduces their size.

So with your proposed change, this could lead to the weird behaviour where the rank of a given value shifts by a large margin before and after merging histograms.

So I'd propose to stay with the "mathematically most correct" way of assuming that all values in a bucket lie at a single point. In practice, the buckets should be small enough that this is not really noticeable.

Member

Good points. Definitely agree that we want to keep the percentile ranks implementation in sync with the percentile implementation. Is there a specific percentiles implementation suggested by OTel that uses midpoints?

Maybe add some commentary on why we're using midpoints rather than interpolation.

JonasKunz (Contributor, Author)

There is nothing in OTel, but this implementation is what is used in the DDSketch and UDDSketch papers, as they provide proofs for the worst-case relative error.

Prometheus does this differently for their native histograms (which are actually exponential histograms):

The worst case is an estimation at one end of a bucket where the actual value is at the other end of the bucket. Therefore, the maximum possible error is the whole width of a bucket. Not doing any interpolation and using some fixed midpoint within a bucket (for example the arithmetic mean or even the harmonic mean) would minimize the maximum possible error (which would then be half of the bucket width in case of the arithmetic mean), but in practice, the linear interpolation yields an error that is lower on average. Since the interpolation has worked well over many years of classic histogram usage, interpolation is also applied for native histograms.

Therefore, PromQL uses exponential extrapolation for the standard schemas, which models the assumption that dividing a bucket into two when increasing the schema number by one (i.e. doubling the resolution) will on average see similar populations in both new buckets. A more detailed explanation can be found in the PR implementing the interpolation method.

(Source)

So in other words, they assume an exponential distribution within the bucket (fewer values near the bucket boundary closer to zero, more near the boundary further away from zero). We could adopt that approach, but it would mean that we'd have to drop the upscaling and make the conversion of explicit-bucket histograms more expensive and less accurate.
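
As a sketch of what that assumption could mean for estimating the fraction of a positive bucket's population that lies below a given value (not the actual PromQL code):

```java
// Exponential interpolation within a positive bucket (lowerBound, upperBound]:
// the geometric midpoint sqrt(lowerBound * upperBound) splits the population in half,
// matching the "doubling the resolution yields similar populations" assumption.
static double exponentialFractionBelow(double lowerBound, double upperBound, double value) {
    if (value <= lowerBound) {
        return 0.0;
    }
    if (value >= upperBound) {
        return 1.0;
    }
    return Math.log(value / lowerBound) / Math.log(upperBound / lowerBound);
}
```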

I also noticed after thinking about it further that what I said above is wrong:

we minimize the relative error of percentile( rank(someValue) / valueCount), meaning that the returned percentile is as close as possible to someValue.
If we now change the assumption of how values are distributed in a bucket, I think we'd need to do the same for the percentiles algorithm.

It doesn't matter whether we return the rank of the first or the last element within a bucket: with our current algorithm the resulting percentile would be the same, because every rank that falls into a bucket is mapped to that bucket's point of least relative error.

Member

I see. I'd say let's leave it as-is for now and add an issue to re-think midpoint vs. interpolation. Should we decide to switch the algorithm, it should be done consistently for both percentile rank and percentile. It's probably also a matter of how strictly we want to be compliant with Prometheus, and whether we actually want to convert explicit-bounds histograms to exponential histograms long-term or have a dedicated type for them.

@JonasKunz self-assigned this on Oct 8, 2025
@JonasKunz merged commit ebd868b into elastic:main on Oct 8, 2025
34 checks passed
@JonasKunz deleted the exp-histogram-rank branch on October 8, 2025 at 13:21

Labels

external-contributor (Pull request authored by a developer outside the Elasticsearch team), >non-issue, :StorageEngine/Mapping (The storage related side of mappings), Team:StorageEngine, v9.3.0
